Effective Genetic Risk Prediction Using Mixed Models
To date, efforts to produce high-quality polygenic risk scores from
genome-wide studies of common disease have focused on estimating and
aggregating the effects of multiple SNPs. Here we propose a novel statistical
approach for genetic risk prediction, based on random and mixed effects models.
Our approach (termed GeRSI) circumvents the need to estimate the effect sizes
of numerous SNPs by treating these effects as random, producing predictions
which are consistently superior to the current state of the art, as we
demonstrate in extensive simulations. When applying GeRSI to seven phenotypes
from the WTCCC
study, we confirm that the use of random effects is most beneficial for
diseases that are known to be highly polygenic: hypertension (HT) and bipolar
disorder (BD). For HT, there are no significant associations in the WTCCC data.
The best existing model yields an AUC of 54%, while GeRSI improves it to 59%.
For BD, using GeRSI improves the AUC from 55% to 62%. For individuals ranked at
the top 10% of BD risk predictions, using GeRSI substantially increases the BD
relative risk from 1.4 to 2.5.
Comment: main text: 14 pages, 3 figures. Supplementary text: 16 pages, 21
figures.
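The mixed-effects idea in this abstract can be illustrated with a generic genomic BLUP sketch: rather than estimating each SNP's effect, treat the effects as random, summarize the genotypes in a genetic relationship matrix (GRM), and predict test individuals' genetic values from the training phenotypes. This is a minimal stand-in, not the GeRSI method itself; the function names and the fixed `h2` parameter are assumptions for illustration.

```python
import numpy as np

def grm(X):
    """Genetic relationship matrix from an n x m genotype matrix
    (individuals x SNPs), using standardized genotypes."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    return Xs @ Xs.T / X.shape[1]

def blup_predict(K, y_train, train_idx, test_idx, h2=0.5):
    """BLUP of genetic values for test individuals.

    K: full GRM over all individuals; h2: assumed heritability, i.e. the
    variance ratio that shrinks the random SNP effects.  The prediction is
    K_st (K_tt + lambda I)^{-1} y_train with lambda = (1 - h2) / h2.
    """
    Ktt = K[np.ix_(train_idx, train_idx)]
    Kst = K[np.ix_(test_idx, train_idx)]
    lam = (1.0 - h2) / h2
    return Kst @ np.linalg.solve(Ktt + lam * np.eye(len(train_idx)), y_train)
```

The key point mirrored from the abstract: no per-SNP effect sizes are ever estimated; only the n x n relationship matrix and one variance ratio enter the prediction.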
A method for generating realistic correlation matrices
Simulating sample correlation matrices is important in many areas of
statistics. Approaches such as generating Gaussian data and finding their
sample correlation matrix or generating random uniform deviates as
pairwise correlations both have drawbacks. We develop an algorithm for adding
noise, in a highly controlled manner, to general correlation matrices. In many
instances, our method yields results which are superior to those obtained by
simply simulating Gaussian data. Moreover, we demonstrate how our general
algorithm can be tailored to a number of different correlation models. Using
our results with a few different applications, we show that simulating
correlation matrices can help assess statistical methodology.
Comment: Published at http://dx.doi.org/10.1214/13-AOAS638 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
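A simple version of the "add noise to a correlation matrix, in a controlled manner" idea can be sketched as follows. This is not the paper's algorithm; it is a common baseline (symmetric noise followed by eigenvalue clipping and rescaling to unit diagonal), with all names and the noise scale chosen for illustration.

```python
import numpy as np

def perturb_corr(R, eps=0.1, seed=0):
    """Add symmetric Gaussian noise to a correlation matrix R, then repair
    the result so it is again a valid correlation matrix:
    1) symmetrize the noise and zero its diagonal,
    2) clip eigenvalues to be nonnegative (nearest-PSD projection),
    3) rescale to unit diagonal."""
    rng = np.random.default_rng(seed)
    p = R.shape[0]
    E = rng.normal(scale=eps, size=(p, p))
    E = (E + E.T) / 2.0
    np.fill_diagonal(E, 0.0)
    M = R + E
    w, V = np.linalg.eigh(M)
    M = V @ np.diag(np.clip(w, 1e-8, None)) @ V.T
    d = np.sqrt(np.diag(M))
    return M / np.outer(d, d)
```

The repair step matters because raw perturbation can produce indefinite matrices, which are not sample correlation matrices of any dataset.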
An Evolutionary Perspective of Animal MicroRNAs and Their Targets
MicroRNAs (miRNAs) are short noncoding RNAs that regulate gene expression through translational inhibition or mRNA degradation by binding to sequences on the target mRNA. miRNA regulation appears to be the most abundant mode of posttranscriptional regulation, affecting ~50% of the transcriptome. miRNA genes are often clustered and/or located in introns, and each targets a variable and often large number of mRNAs. Here we discuss the genomic architecture of animal miRNA genes and their evolving interaction with their target mRNAs.
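The binding mechanism mentioned above is commonly operationalized as "seed matching": target sites in an mRNA are complementary to nucleotides 2-8 of the miRNA. A toy scan for such sites can be sketched as follows (a simplified illustration only; real target prediction also weighs site context, conservation, and pairing stability):

```python
def revcomp(seq):
    """Reverse complement of an RNA sequence."""
    comp = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(comp[b] for b in reversed(seq))

def seed_sites(mirna, utr):
    """Positions in an mRNA 3'UTR (RNA alphabet) that are perfectly
    complementary to the miRNA 7-mer seed (positions 2-8, 1-based)."""
    site = revcomp(mirna[1:8])
    return [i for i in range(len(utr) - 6) if utr[i:i + 7] == site]
```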
Measuring missing heritability: Inferring the contribution of common variants
Genome-wide association studies (GWASs), also called common variant association studies (CVASs), have uncovered thousands of genetic variants associated with hundreds of diseases. However, the variants that reach statistical significance typically explain only a small fraction of the heritability. One explanation for the "missing heritability" is that there are many additional disease-associated common variants whose effects are too small to detect with current sample sizes. It therefore is useful to have methods to quantify the heritability due to common variation, without having to identify all causal variants. Recent studies applied restricted maximum likelihood (REML) estimation to case-control studies for diseases. Here, we show that REML considerably underestimates the fraction of heritability due to common variation in this setting. The degree of underestimation increases with the rarity of disease, the heritability of the disease, and the size of the sample. Instead, we develop a general framework for heritability estimation, called phenotype correlation-genotype correlation (PCGC) regression, which generalizes the well-known Haseman-Elston regression method. We show that PCGC regression yields unbiased estimates. Applying PCGC regression to six diseases, we estimate the proportion of the phenotypic variance due to common variants to range from 25% to 56% and the proportion of heritability due to common variants from 41% to 68% (mean 60%). These results suggest that common variants may explain at least half the heritability for many diseases. PCGC regression also is readily applicable to other settings, including analyzing extreme-phenotype studies and adjusting for covariates such as sex, age, and population structure.
National Institutes of Health (U.S.) (NIH HG003067); Broad Institute of MIT and Harvard
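The Haseman-Elston idea that PCGC regression generalizes can be sketched directly: regress products of (standardized) phenotypes over pairs of individuals on the corresponding genetic correlations; the slope estimates the heritability captured by the genotypes. This is the classical HE baseline, not PCGC itself, and the helper below is a minimal illustration.

```python
import numpy as np

def he_regression(K, y):
    """Haseman-Elston style heritability estimate.

    For standardized phenotypes, E[y_i * y_j] = h2 * K_ij for i != j,
    so the slope of y_i*y_j on K_ij over all distinct pairs estimates h2.
    """
    y = (y - y.mean()) / y.std()
    iu = np.triu_indices_from(K, k=1)       # distinct pairs only
    k_pairs = K[iu]
    yy_pairs = np.outer(y, y)[iu]
    slope, _ = np.polyfit(k_pairs, yy_pairs, 1)
    return slope
```

PCGC's contribution, per the abstract, is correcting this kind of moment estimator for case-control ascertainment and covariates, where REML is shown to be biased.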
EST2Prot: Mapping EST sequences to proteins
BACKGROUND: EST libraries are used in various biological studies, from microarray experiments to proteomic and genetic screens. These libraries usually contain many uncharacterized ESTs that are typically ignored since they cannot be mapped to known genes. Consequently, new discoveries are possibly overlooked. RESULTS: We describe a system (EST2Prot) that uses multiple elements to map EST sequences to their corresponding protein products. EST2Prot uses UniGene clusters, substring analysis, information about protein coding regions in existing DNA sequences and protein database searches to detect protein products related to a query EST sequence. Gene Ontology terms, Swiss-Prot keywords, and protein similarity data are used to map the ESTs to functional descriptors. CONCLUSION: EST2Prot extends and significantly enriches the popular UniGene mapping by utilizing multiple relations between known biological entities. It produces a mapping between ESTs and proteins in real-time through a simple web-interface. The system is part of the Biozon database and is accessible at
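One ingredient the abstract names, substring analysis against protein coding regions, can be illustrated with a toy mapper: translate the EST in its three forward reading frames and look for a sufficiently long peptide fragment inside known proteins. This is a simplified stand-in for EST2Prot's pipeline (which also uses UniGene clusters and database searches); all names and the length threshold are illustrative.

```python
from itertools import product

# Standard codon table, built compactly in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def frames(dna):
    """Translate a DNA sequence in its three forward reading frames."""
    for off in range(3):
        yield "".join(CODON[dna[i:i + 3]] for i in range(off, len(dna) - 2, 3))

def map_est(est, proteins, min_len=8):
    """Report proteins containing a peptide fragment (between stop codons,
    at least min_len residues) translated from any frame of the EST."""
    hits = set()
    for pep in frames(est):
        for chunk in pep.split("*"):
            if len(chunk) >= min_len:
                hits.update(name for name, prot in proteins.items()
                            if chunk in prot)
    return hits
```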
Demystifying the Adversarial Robustness of Random Transformation Defenses
Neural networks' lack of robustness against attacks raises concerns in
security-sensitive settings such as autonomous vehicles. While many
countermeasures may look promising, only a few withstand rigorous evaluation.
Defenses using random transformations (RT) have shown impressive results,
particularly BaRT (Raff et al., 2019) on ImageNet. However, this type of
defense has not been rigorously evaluated, leaving its robustness properties
poorly understood. Their stochastic properties make evaluation more challenging
and render many proposed attacks on deterministic models inapplicable. First,
we show that the BPDA attack (Athalye et al., 2018a) used in BaRT's evaluation
is ineffective and likely overestimates its robustness. We then attempt to
construct the strongest possible RT defense through the informed selection of
transformations and Bayesian optimization for tuning their parameters.
Furthermore, we create the strongest possible attack to evaluate our RT
defense. Our new attack vastly outperforms the baseline, reducing the accuracy
by 83% compared to the 19% reduction by the commonly used EoT attack. Our
result indicates that the RT defense on the Imagenette dataset (a ten-class
subset of ImageNet) is not robust against
adversarial examples. Extending the study further, we use our new attack to
adversarially train the RT defense (called AdvRT), resulting in a large robustness
gain. Code is available at
https://github.com/wagner-group/demystify-random-transform.
Comment: ICML 2022 (short presentation), AAAI 2022 AdvML Workshop (best paper,
oral presentation).
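The EoT attack mentioned in this abstract has a simple core: since a randomized defense makes the loss stochastic, attack the *expected* loss by averaging gradients over sampled transformations. A minimal sketch of that gradient estimator (a generic illustration, not the paper's attack; the toy model in the usage below assumes identity-Jacobian transforms such as additive noise):

```python
import numpy as np

def eot_gradient(x, grad_fn, sample_transform, n_samples=32, seed=0):
    """Expectation-over-Transformation (EoT) gradient estimate.

    Averages grad_fn over randomly transformed copies of the input, so an
    attack step targets the defense's expected behavior rather than a single
    random draw.  For transforms with a nontrivial Jacobian, grad_fn would
    need to backpropagate through the transform as well.
    """
    rng = np.random.default_rng(seed)
    g = np.zeros_like(x, dtype=float)
    for _ in range(n_samples):
        t = sample_transform(rng)
        g += grad_fn(t(x))
    return g / n_samples
```

An attacker would then take signed gradient steps (PGD-style) using this averaged gradient; the abstract's point is that even stronger transformation-aware attacks can drive the defense's accuracy far below what EoT suggests.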